Post-vaccine epidemiology of serotype 3 pneumococci identifies transformation inhibition through prophage-driven alteration of a non-coding RNA

Background The respiratory pathogen Streptococcus pneumoniae (the pneumococcus) is a genetically diverse bacterium associated with over 101 immunologically distinct polysaccharide capsules (serotypes). Polysaccharide conjugate vaccines (PCVs) have successfully eliminated multiple targeted serotypes, yet the mucoid serotype 3 has persisted despite its inclusion in PCV13. This capsule type is predominantly associated with a single globally disseminated strain, GPSC12 (clonal complex 180). Methods A genomic epidemiology study combined previous surveillance datasets of serotype 3 pneumococci to analyse the population structure, dynamics, and differences in rates of diversification within GPSC12 during the period of PCV introductions. Transcriptomic analyses, whole genome sequencing, mutagenesis, and electron microscopy were used to characterise the phenotypic impact of loci hypothesised to affect this strain’s evolution. Results GPSC12 was split into clades by a genomic analysis. Clade I, the most common, rarely underwent transformation, but was typically infected with the prophage ϕOXC141. Prior to the introduction of PCV13, this clade’s composition shifted towards a ϕOXC141-negative subpopulation in a systematically sampled UK collection. In the post-PCV13 era, more rapidly recombining non-Clade I isolates, also ϕOXC141-negative, have risen in prevalence. The low in vitro transformation efficiency of a Clade I isolate could not be fully explained by the ~100-fold reduction attributable to the serotype 3 capsule. Accordingly, prophage ϕOXC141 was found to modify csRNA3, a non-coding RNA that inhibits the induction of transformation. This alteration was identified in ~30% of all pneumococci and was particularly common in the unusually clonal serotype 1 GPSC2 strain. RNA-seq and quantitative reverse transcriptase PCR experiments using a genetically tractable pneumococcus demonstrated the altered csRNA3 was more effective at inhibiting production of the competence-stimulating peptide pheromone. This resulted in a reduction in the induction of competence for transformation. Conclusion This interference with the quorum sensing needed to induce competence reduces the risk of the prophage being deleted by homologous recombination. Hence the selfish prophage-driven alteration of a regulatory RNA limits cell-cell communication and horizontal gene transfer, complicating the interpretation of post-vaccine population dynamics. Supplementary Information The online version contains supplementary material available at 10.1186/s13073-022-01147-2.


Figure S2
Overview of the serotype 3 population. (A) Neighbour-joining tree calculated from core genome distances estimated using PopPUNK across the full serotype 3 dataset of 971 isolates. The columns show the assignment of GPSC12 isolates to clades in Croucher et al (2012); Azarian et al (2016), which was subsequently extended by Groves et al (2019); and this work. (B) Recombination-corrected maximum likelihood phylogeny of 891 GPSC12 isolates. The assignment to clades is shown as in panel A. (C) Bar chart showing the assignment of 971 serotype 3 isolates to GPSCs using PopPUNK and version 6 of the GPS strain database. The majority (891) isolates belonged to GPSC12.

Figure S5
Phylodynamic analysis of Clade I with BactDating. (A) Root-to-tip distance analysis of Clade I. The left panel shows the Clade I subtree, extracted from the Gubbins analysis of GPSC12, rooted to maximise the correlation between the date at which isolates were sampled and their root-to-tip distance. Leaf nodes are coloured according to their sampling date: blue indicates bacteria collected relatively early, and red indicates bacteria collected relatively recently. The right panel shows the correlation between the sampling date and the root-to-tip distances, using the same colouring. The linear model fit shows the molecular clock rate, and the point at which the line intercepts the horizontal axis indicates the estimated date of Clade I's origin. (B) Markov Chain Monte Carlo sampling of the BactDating model parameters. These demonstrate the convergence of the algorithm on a stable set of parameter estimates.

Figure S6
Inferring the distribution of ϕOXC141-like prophage. The 891 GPSC12 isolates were aligned to the 34,080 bp sequence of ϕOXC141, extracted from S. pneumoniae OXC141 (accession code FQ312027). The density plot shows the size distribution of the longest BLASTN alignment to the query sequence from each isolate. The red dashed line shows the threshold (12.5 kb) used to distinguish alignments to ϕOXC141-like prophage from less specific matches to more divergent prophage.

Figure S7
Inferring the status of the attBOXC site. The 891 GPSC12 isolates were aligned to a 4,477 bp sequence spanning the unmodified attBOXC insertion site within the S. pneumoniae TIGR4 genome (accession code AE005672). The density plot shows the size distribution of the longest BLASTN alignment to the query sequence from each isolate. The red dashed line shows the threshold (3,750 bp) used to distinguish the longer, unmodified attBOXC sites from those that are shorter, and therefore likely to have been disrupted by a prophage insertion.

Figure S8
Identifying the distribution of prophage ϕOXC141 across the GPS collection. (A) The 34,080 bp sequence of ϕOXC141 was extracted from S. pneumoniae OXC141 (accession code FQ312027). The density plot shows the distribution of sizes for the longest BLASTN alignment to each GPS isolate. The red dashed line at 12.5 kb represents the threshold separating alignments to full-length ϕOXC141-like prophage from matches to other phage sequences. (B) Distribution of ϕOXC141-like prophage between GPSCs using the 20,047 isolates in the GPS collection. GPSCs with an index above 50 are merged to enable visualisation of the prophage's enrichment in GPSC12. Evidence of ϕOXC141-like prophage was only identified in five isolates outside of GPSC12. (C) Comparison of ϕOXC141-like prophage in GPSC12 isolates outside of Clade I. The blue arrows represent predicted protein coding sequences. The red bands between prophage link regions of similar sequence, with the level of DNA sequence identity calculated using BLASTN indicated by the key. Each prophage is labelled with the accession code of the isolate in which it was identified, and the clade to which the isolate was assigned. Two prophage did not assemble within a single contig, and therefore the contig breaks are marked on the sequences.

Figure S9
Replacement of the cps locus of S. pneumoniae R6 with that from S. pneumoniae 99-4038 in the recombinant S. pneumoniae cps99-4038. The region encompassed by the recombination inferred to span the cps locus in S. pneumoniae cps99-4038 is aligned to the orthologous region of the parental genotype, R6, with BLASTN. The red bands link regions of similar sequence, with the colour indicating the level of sequence identity between the pair. The cps locus itself is flanked by the dexB and aliA genes. The serotype 3 allele contains two functional genes: cps3D, encoding a UDP-glucose 6-dehydrogenase, and cps3S, encoding the polymerase.

Figure S11
Comparison of the predicted structures of (A) csRNA3 from S. pneumoniae R6 and (B) csRNA3L from S. pneumoniae OXC141. The sequence "CAAUCA" is highlighted in red. (C) Predicted interaction between csRNA3 from S. pneumoniae R6 and the comC transcript from S. pneumoniae OXC141, which has the CSP1 pherotype. The horizontal axis corresponds to the length of the comC mRNA (from the transcription initiation site to the stop codon of the protein coding sequence), and the vertical axis corresponds to the length of csRNA3. The colour shows the minimal energy of the interaction between the sequences at the coordinates specified by the axes in kcal mol -1 , as indicated by the key. (D) Predicted interaction between csRNA3L from S. pneumoniae OXC141 and the comC transcript from S. pneumoniae OXC141, which has the CSP1 pherotype. Data are displayed as described for panel (C). (E) Predicted interaction between csRNA3 from S. pneumoniae R6 and the comC transcript from S. pneumoniae RMV8, which has the CSP2 pherotype. Data are displayed as described for panel (C). (F) Predicted interaction between csRNA3L from S. pneumoniae OXC141 and the comC transcript from S. pneumoniae RMV8, which has the CSP2 pherotype. Data are displayed as described for panel (C).   pneumoniae 99-4038 isolate, and ϕOXC141-negative and ϕOXC141-positive isolates from the third round of the passage. The isolates found to be ϕOXC141-negative by PCR amplification also exhibited higher tolerance of mitomycin C than those that were found to be ϕOXC141positive. This would be expected if the prophage has been deleted, rather than excised.

Figure S18
Modification of ccnC through insertion of ϕRMV8 into attBOXC. The insertion of ϕRMV8, indicated by the pink genes integrated downstream of purA, into the S. pneumoniae RMV8 genome (accession code OX244288) is compared to the unmodified attBOXC site of R6 using BLASTN. The red bands link regions of similar sequence in the two loci, with the colour indicating the level of sequence identity between the pair. The black vertical lines show how the cellular ccnC gene at the attB site is split into the ccnCL and ccnCR genes at the attL and attR sites, respectively. The second set of pink genes, inserted on the right of ϕRMV8, represents a phage-related chromosomal island (PRCI) inserted adjacent to the tadA gene. Figure S19 Differential arrangement of the tvr loci distinguishing RMV8domi and RMV8rare. Each is labelled with the DNA motif they target. The red boxes correspond to the conserved methylase (hsdM, at the 5' end) and endonuclease (hsdR, at the 3' end) genes. The blue boxes correspond to the sequences encoding the target-recognition domains (TRDs) of the specificity subunit. The dark blue boxes correspond to the 5' TRD-encoding sequences, and the light blue boxes correspond to the 3' TRD-encoding sequences. The active specificity subunit gene (hsdS) is formed by the combination of a 5' and 3' TRD-encoding sequence that is closest to the 5' end of the locus. The HsdS protein determines the motif targeted by the system encoded by the tvr locus. The rearrangement of the TRD-encoding sequences is driven by TvrR, a recombinase encoded by a gene represented by the pink box, which is regulated by the products of the tvrAT genes, represented by the grey boxes. The tvrR gene is truncated in these "locked" variants to limit the interconversion between arrangements at the tvr locus. The atypical growth profile is that of S. pneumoniae RMV8rare, which has a growth defect in late exponential phase, relative to other genotypes. The dashed horizontal line at an OD600 of 0.5 corresponds to the cell density at which samples for RNA-seq were taken for each genotype. (B) Growth curves for RMV8rare genotypes. Data are shown as in panel (A), but summarise four replicate experiments per genotype. The growth defect of RMV8rare is eliminated by the replacement of prophage ϕRMV8 with a tetM resistance marker. (C) Assays of the activity of ϕRMV8 using quantitative PCR (qPCR). The same primer arrangements are used as displayed in Additional file 1: Fig. S15. The proportion of phage excised from the chromosome was estimated by qPCR of an amplicon spanning the attL site (using primers A and B), the attB site (using primers A and D), and the attP site (using primers B and C). The ratios of the concentrations of the products of primers A and D and primers A and B were used to quantify the level of prophage excision for the four genotypes analysed by RNA-seq during the late exponential phase (OD600 = 0.5), and after overnight growth. The ratios of the concentrations of the products of primers B and C and the products of primers A and D were used to quantify the level of circularised phage for the same four RMV8 genotypes after overnight growth.

Figure S21
Density plots showing the inferred distribution of fragment sizes from mapping by Kallisto. The consistency of these distributions between samples, each labelled with its accession code (Additional file 5: Table S4), suggests there should not be any bias introduced by variation in sequencing library preparation.  Points are coloured red if the null hypothesis can be rejected at a false discovery rate of 10 -3 , following a Benjamini-Hochberg correction for multiple testing. The Q-Q plot shows this threshold captures the major differences between the genotypes.

Figure S24
Volcano plots contrasting the transcriptional patterns between RMV8domi and the other analysed genotypes, as indicated by the plot titles. The horizontal axis shows the natural logarithm of the fold difference in expression levels between the genotypes, β. The vertical axis shows the negative base 10 logarithm of the q value, calculated using a Benjamini-Hochberg correction. Points are coloured red where this value exceeds the false discovery rate threshold of 10 -3 .

Figure S25
Quantification of the expression of ϕRMV8 lysogeny genes using RNA-seq data. Points are coloured using the scheme described in Fig. 6, as shown in the key. Each plot shows the transcription of a different gene across the four genotypes in scaled reads per base, which represents gene expression as the normalised mean number of reads mapping to the bases in a sequence. These genes tend to be most highly expressed in RMV8rare.

Figure S26
Quantification of the expression of ϕRMV8 replication genes using RNA-seq data. Points are coloured using the scheme described in Fig. 6. Each plot shows the transcription of a different gene, in scaled reads per base, across the four genotypes. These genes tend to be most highly expressed in RMV8rare.

Figure S27
Quantification of the expression of ϕRMV8 structural genes using RNA-seq data.
Points are coloured using the scheme described in Fig. 6. Each plot shows the transcription of a different gene, in scaled reads per base, across the four genotypes. These genes tend to be most highly expressed in RMV8rare.

Figure S28
Quantification of the expression of ϕRMV8 lysis genes using RNA-seq data. Points are coloured using the scheme described in Fig. 6. Each plot shows the transcription of a different gene, in scaled reads per base, across the four genotypes. These genes tend to be most highly expressed in RMV8rare.

Figure S29
Quantification of the expression of early competence genes using RNA-seq data. Points are coloured using the scheme described in Fig. 6, as shown in the key. Each plot shows the transcription of a different gene, in scaled reads per base, across the four genotypes. The comCDE operon was more highly transcribed in RMV8rare tvr::cat than the other genotypes, although this difference was only significant for comC.

Figure S31
Characterisation of RMV8rare ccnC and ccnCL mutants. (A) Growth curves comparing RMV8rare ccnC and ccnCL mutants using three replicates. The solid line represents the median, and the ribbon shows the range between the minimum and maximum. (B) Negative controls for transformation experiments with RMV8rare mutants. These experiments were designed to test whether the rifampicin-resistant colonies isolated in the absence of added CSP were arising through transformation. In one set of experiments, the protocol was undertaken with no exogenous DNA. A maximum of one rifampicin-resistant colony was isolated in each experiment, likely representing the infrequent emergence of this resistance phenotype through spontaneous mutation. Similarly, the protocol was undertaken with exogenous DNA in the presence of DNase I. This again yielded few rifampicin-resistant colonies, demonstrating that the rifampicin-resistant colonies observed in the transformation experiments were primarily generated through uptake of exogenous naked DNA.